Set this up in a virtual machine. Run the following in a terminal:
sudo apt update
sudo apt upgrade
sudo apt install libncurses5
sudo apt install libtinfo5
sudo apt install libncurses5-dev libncursesw5-dev
You need to register a Xilinx account first, which requires email verification.
After that, go to the Xilinx tool download page and download the Xilinx Unified Installer 2022.1: Linux Self Extracting Web Installer.
By default, the downloaded file will be in your ~/Downloads/ directory. Execute it:
chmod +x Xilinx_Unified_2022.1_<>_Lin64.bin
sudo ./Xilinx_Unified_2022.1_<>_Lin64.bin
This should open an installation interface. Enter your account credentials and choose Download and Install Now. Keep all default installation options and accept every license agreement. The installation takes around five hours, so make sure you have a stable internet connection.
Execute the sudo nano ~/.bashrc command and add the following line to the file:
source /tools/Xilinx/Vitis/2022.1/settings64.sh
Close the terminal and start a new one.
Open Vivado with the vivado command. In the top bar, select Help -> Manage License...
Select Obtain License. Select Get Free ISE WebPack, ISE/Vivado or PetaLinux Licenses. Click Connect Now.
You should be directed to license generation page. Log in. Select WebPack License and generate it (if you can't find it, press ctrl + '-' to shrink down text). Keep all default settings.
After that, switch to the Manage License tab. You can either download the license or send it to your email.
Once you have your license file (.lic), go back to the Vivado license manager. Select Load License, click Copy License..., select your license file, and click Open.
Once done, you can select View License Status to check your license.
Be sure to have a licensed Vitis first.
Execute the following commands in a terminal:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg lsb-release
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
Verify the Docker Engine:
sudo service docker start
sudo docker run hello-world
FINN requires Docker to be runnable without root privileges:
sudo groupadd docker
sudo usermod -aG docker $USER
After that, restart your virtual machine and verify your settings:
docker run hello-world
Execute the sudo nano ~/.bashrc command and add the following lines to the file:
export FINN_XILINX_PATH=/tools/Xilinx
export FINN_XILINX_VERSION=2022.1
export FINN_HOST_BUILD_DIR=~/finn/build
export PYNQ_BOARD=Pynq-Z2
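After the new lines are in ~/.bashrc, a quick way to confirm they are visible is to echo them from a shell. A minimal sketch (the export lines below just mirror the settings above so the snippet is self-contained):

```shell
# mirror the settings above so this snippet runs standalone
export FINN_XILINX_PATH=/tools/Xilinx
export FINN_XILINX_VERSION=2022.1
# the Vitis settings script sourced earlier lives under this path
echo "$FINN_XILINX_PATH/Vitis/$FINN_XILINX_VERSION/settings64.sh"
```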
Get FINN via git:
cd ~
sudo apt install git
git clone https://github.com/Xilinx/finn/
cd finn
mkdir build
Execute ./run-docker.sh quicktest to check that your settings are correct. You should see at most one error at the end of the test.
Connect your PYNQ board to your virtual machine. By default, its IP address is 192.168.2.99, its username is xilinx, and its password is xilinx. Install the bitstring library on the PYNQ board. If the board can connect to the internet, run sudo python3.6 -m pip install bitstring on it; if not, download the bitstring tar.gz, upload it to the board, then execute sudo python3.6 -m pip install <your bitstring tar.gz file>
Execute ./run-docker.sh in your finn directory to launch a FINN Docker container. Run the following commands in it:
cd ssh_keys
# Keep everything default in the next command
ssh-keygen
ssh-copy-id -i id_rsa.pub xilinx@192.168.2.99
Test with the ssh xilinx@192.168.2.99 command. You should be able to log in without a password.
Open a new terminal and go to the finn directory. Execute ./run-docker.sh notebook. After about 10 minutes, you should get a URL at the bottom; Ctrl-click it to open the FINN Jupyter interface.
The following commands are executed in the FINN Jupyter environment.
Reference: end2end_example/cybersecurity/1-train-mlp-with-brevitas.ipynb. You can go there and run these commands yourself.
Remember to import onnx before PyTorch.
import onnx
import torch
Get the pre-quantized dataset:
! wget -O unsw_nb15_binarized.npz https://zenodo.org/record/4519767/files/unsw_nb15_binarized.npz?download=1
Get the train and test TensorDatasets:
import numpy as np
from torch.utils.data import TensorDataset
def get_preqnt_dataset(data_dir: str, train: bool):
    unsw_nb15_data = np.load(data_dir + "/unsw_nb15_binarized.npz")
    if train:
        partition = "train"
    else:
        partition = "test"
    part_data = unsw_nb15_data[partition].astype(np.float32)
    part_data = torch.from_numpy(part_data)
    part_data_in = part_data[:, :-1]
    part_data_out = part_data[:, -1]
    return TensorDataset(part_data_in, part_data_out)
train_quantized_dataset = get_preqnt_dataset(".", True)
test_quantized_dataset = get_preqnt_dataset(".", False)
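Each row of a partition holds 593 binary features followed by one binary label, which the slicing above splits apart. A self-contained sketch of that split on a synthetic stand-in array (the real .npz file is not needed here):

```python
import numpy as np

# synthetic stand-in with the same row layout as unsw_nb15_binarized.npz:
# 593 binary feature columns followed by a single binary label column
part = np.random.randint(0, 2, size=(10, 594)).astype(np.float32)
features, labels = part[:, :-1], part[:, -1]
print(features.shape, labels.shape)  # (10, 593) (10,)
```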
Get the train and test DataLoaders:
from torch.utils.data import DataLoader, Dataset
batch_size = 1000
# dataset loaders
train_quantized_loader = DataLoader(train_quantized_dataset, batch_size=batch_size, shuffle=True)
test_quantized_loader = DataLoader(test_quantized_dataset, batch_size=batch_size, shuffle=False)
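A DataLoader simply yields the dataset in batch_size chunks, with a final partial batch if the sizes do not divide evenly. A toy sketch of that behavior (toy sizes, not the real dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy dataset: 10 samples with batch_size 4 -> 3 batches (last one partial)
ds = TensorDataset(torch.zeros(10, 3), torch.zeros(10))
loader = DataLoader(ds, batch_size=4, shuffle=False)
print(len(loader))  # 3
```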
Detect GPU or CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Define the MLP model hyperparameters:
input_size = 593
hidden1 = 64
hidden2 = 64
hidden3 = 64
weight_bit_width = 2
act_bit_width = 2
num_classes = 1
num_epochs = 10
lr = 0.001
Define quantized MLP model structure
from brevitas.nn import QuantLinear, QuantReLU
import torch.nn as nn
# Setting seeds for reproducibility
torch.manual_seed(0)
model = nn.Sequential(
    QuantLinear(input_size, hidden1, bias=True, weight_bit_width=weight_bit_width),
    nn.BatchNorm1d(hidden1),
    nn.Dropout(0.5),
    QuantReLU(bit_width=act_bit_width),
    QuantLinear(hidden1, hidden2, bias=True, weight_bit_width=weight_bit_width),
    nn.BatchNorm1d(hidden2),
    nn.Dropout(0.5),
    QuantReLU(bit_width=act_bit_width),
    QuantLinear(hidden2, hidden3, bias=True, weight_bit_width=weight_bit_width),
    nn.BatchNorm1d(hidden3),
    nn.Dropout(0.5),
    QuantReLU(bit_width=act_bit_width),
    QuantLinear(hidden3, num_classes, bias=True, weight_bit_width=weight_bit_width)
)
model.to(device)
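As a rough sanity check of the sizes above, the four QuantLinear layers alone carry the following number of weight and bias parameters (ignoring the BatchNorm affine parameters); plain Python is enough to compute this:

```python
# (in_features, out_features) for the four QuantLinear layers above
layers = [(593, 64), (64, 64), (64, 64), (64, 1)]
# weights (in * out) plus biases (out) per layer
total = sum(i * o + o for i, o in layers)
print(total)  # 46401
```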
Define the model training method:
def train(model, train_loader, optimizer, criterion):
    losses = []
    # ensure model is in training mode
    model.train()
    for i, data in enumerate(train_loader, 0):
        inputs, target = data
        inputs, target = inputs.to(device), target.to(device)
        optimizer.zero_grad()
        # forward pass
        output = model(inputs.float())
        loss = criterion(output, target.unsqueeze(1))
        # backward pass + run optimizer to update weights
        loss.backward()
        optimizer.step()
        # keep track of loss value
        losses.append(loss.data.cpu().numpy())
    return losses
Define the model accuracy calculation method:
import torch
from sklearn.metrics import accuracy_score
def test(model, test_loader):
    # ensure model is in eval mode
    model.eval()
    y_true = []
    y_pred = []
    with torch.no_grad():
        for data in test_loader:
            inputs, target = data
            inputs, target = inputs.to(device), target.to(device)
            output_orig = model(inputs.float())
            # run the output through sigmoid
            output = torch.sigmoid(output_orig)
            # compare against a threshold of 0.5 to generate 0/1
            pred = (output.detach().cpu().numpy() > 0.5) * 1
            target = target.cpu().float()
            y_true.extend(target.tolist())
            y_pred.extend(pred.reshape(-1).tolist())
    return accuracy_score(y_true, y_pred)
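The sigmoid-plus-threshold step in test() maps raw logits to 0/1 predictions; a minimal numpy illustration of that mapping:

```python
import numpy as np

# raw logits -> sigmoid probabilities -> 0/1 predictions at a 0.5 threshold,
# mirroring the thresholding inside test()
logits = np.array([-2.0, 0.0, 3.0])
probs = 1.0 / (1.0 + np.exp(-logits))
preds = (probs > 0.5) * 1
print(preds.tolist())  # [0, 0, 1]
```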
Define the loss function and optimizer:
# loss criterion and optimizer
criterion = nn.BCEWithLogitsLoss().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999))
Train the model:
import numpy as np
from sklearn.metrics import accuracy_score
from tqdm import tqdm, trange
# Setting seeds for reproducibility
torch.manual_seed(0)
np.random.seed(0)
t = trange(num_epochs, desc="Training loss", leave=True)
for epoch in t:
    loss_epoch = train(model, train_quantized_loader, optimizer, criterion)
    test_acc = test(model, test_quantized_loader)
    t.set_description("Training loss = %f test accuracy = %f" % (np.mean(loss_epoch), test_acc))
    t.refresh()  # to show immediately the update
Test and save the model state so training can resume later:
test(model, test_quantized_loader)
# Save the Brevitas model to disk
torch.save(model.state_dict(), "state_dict_self-trained.pth")
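The state_dict saved above can be loaded back into any module with the same layer shapes. A small round-trip sketch with a plain nn.Linear (the filename tmp_state.pth is arbitrary):

```python
import torch
import torch.nn as nn

# save a state_dict, then load it into a freshly constructed module
m = nn.Linear(4, 2)
torch.save(m.state_dict(), "tmp_state.pth")
m2 = nn.Linear(4, 2)
m2.load_state_dict(torch.load("tmp_state.pth"))
print(torch.equal(m.weight, m2.weight))  # True
```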
Prepare for Network Surgery
# Move the model to CPU before surgery
model = model.cpu()
Pad the input (593 -> 600) to make the later FINN steps easier:
from copy import deepcopy
modified_model = deepcopy(model)
W_orig = modified_model[0].weight.data.detach().numpy()
W_orig.shape
import numpy as np
# pad the second (593-sized) dimensions with 7 zeroes at the end
W_new = np.pad(W_orig, [(0,0), (0,7)])
W_new.shape
modified_model[0].weight.data = torch.from_numpy(W_new)
modified_model[0].weight.shape
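The np.pad call above only touches the last axis: [(0, 0), (0, 7)] means no padding on the output dimension and seven zero columns appended to the input dimension. A small sketch of the same call on a toy weight matrix:

```python
import numpy as np

# toy weight matrix with the same padding spec as above
W = np.ones((4, 593), dtype=np.float32)
W_new = np.pad(W, [(0, 0), (0, 7)])  # append 7 zero columns
print(W_new.shape)                   # (4, 600)
print(float(W_new[:, -7:].sum()))    # 0.0 -- the new columns are zeros
```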
Turn [0, 1] input into [-1, +1] input
from brevitas.nn import QuantIdentity
class CybSecMLPForExport(nn.Module):
    def __init__(self, my_pretrained_model):
        super(CybSecMLPForExport, self).__init__()
        self.pretrained = my_pretrained_model
        self.qnt_output = QuantIdentity(
            quant_type='binary',
            scaling_impl_type='const',
            bit_width=1, min_val=-1.0, max_val=1.0)

    def forward(self, x):
        # assume x contains bipolar {-1,1} elems
        # shift from {-1,1} -> {0,1} since that is the
        # input range for the trained network
        x = (x + torch.tensor([1.0]).to(x.device)) / 2.0
        out_original = self.pretrained(x)
        out_final = self.qnt_output(out_original)  # output as {-1,1}
        return out_final
model_for_export = CybSecMLPForExport(modified_model)
model_for_export.to(device)
Verify modified model's accuracy
def test_padded_bipolar(model, test_loader):
    # ensure model is in eval mode
    model.eval()
    y_true = []
    y_pred = []
    with torch.no_grad():
        for data in test_loader:
            inputs, target = data
            inputs, target = inputs.to(device), target.to(device)
            # pad inputs to 600 elements
            input_padded = torch.nn.functional.pad(inputs, (0, 7, 0, 0))
            # convert inputs to {-1,+1}
            input_scaled = 2 * input_padded - 1
            # run the model
            output = model(input_scaled.float())
            y_pred.extend(list(output.flatten().cpu().numpy()))
            # make targets bipolar {-1,+1}
            expected = 2 * target.float() - 1
            expected = expected.cpu().numpy()
            y_true.extend(list(expected.flatten()))
    return accuracy_score(y_true, y_pred)
test_padded_bipolar(model_for_export, test_quantized_loader)
Export to FINN-ONNX
import brevitas.onnx as bo
from brevitas.quant_tensor import QuantTensor
ready_model_filename = "cybsec-mlp-ready.onnx"
input_shape = (1, 600)
# create a QuantTensor instance to mark input as bipolar during export
input_a = np.random.randint(0, 1, size=input_shape).astype(np.float32)
input_a = 2 * input_a - 1
scale = 1.0
input_t = torch.from_numpy(input_a * scale)
input_qt = QuantTensor(
input_t, scale=torch.tensor(scale), bit_width=torch.tensor(1.0), signed=True
)
#Move to CPU before export
model_for_export.cpu()
# Export to ONNX
bo.export_finn_onnx(
model_for_export, export_path=ready_model_filename, input_t=input_qt
)
print("Model saved to %s" % ready_model_filename)
View the Exported ONNX in Netron
from finn.util.visualization import showInNetron
showInNetron(ready_model_filename)
Reference: end2end_example/cybersecurity/2-import-into-finn-and-verify.ipynb. You can go there and run these commands yourself. Be sure to run the previous part to get the necessary .onnx files, then close and halt that notebook, because Netron visualizations use the same port.
import onnx
import torch
Import the ONNX model into FINN:
from qonnx.core.modelwrapper import ModelWrapper
ready_model_filename = "cybsec-mlp-ready.onnx"
model_for_sim = ModelWrapper(ready_model_filename)
Apply tidy-up graph transformations:
from qonnx.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames, RemoveStaticGraphInputs
from qonnx.transformation.infer_shapes import InferShapes
from qonnx.transformation.infer_datatypes import InferDataTypes
from qonnx.transformation.fold_constants import FoldConstants
model_for_sim = model_for_sim.transform(InferShapes())
model_for_sim = model_for_sim.transform(FoldConstants())
model_for_sim = model_for_sim.transform(GiveUniqueNodeNames())
model_for_sim = model_for_sim.transform(GiveReadableTensorNames())
model_for_sim = model_for_sim.transform(InferDataTypes())
model_for_sim = model_for_sim.transform(RemoveStaticGraphInputs())
verif_model_filename = "cybsec-mlp-verification.onnx"
model_for_sim.save(verif_model_filename)
See the model structure after the transformations:
from finn.util.visualization import showInNetron
showInNetron(verif_model_filename)
To verify the model after transformation, load the dataset:
import numpy as np
from torch.utils.data import TensorDataset
def get_preqnt_dataset(data_dir: str, train: bool):
    unsw_nb15_data = np.load(data_dir + "/unsw_nb15_binarized.npz")
    if train:
        partition = "train"
    else:
        partition = "test"
    part_data = unsw_nb15_data[partition].astype(np.float32)
    part_data = torch.from_numpy(part_data)
    part_data_in = part_data[:, :-1]
    part_data_out = part_data[:, -1]
    return TensorDataset(part_data_in, part_data_out)
n_verification_inputs = 100
test_quantized_dataset = get_preqnt_dataset(".", False)
input_tensor = test_quantized_dataset.tensors[0][:n_verification_inputs]
input_tensor.shape
Load the model before transformation
input_size = 593
hidden1 = 64
hidden2 = 64
hidden3 = 64
weight_bit_width = 2
act_bit_width = 2
num_classes = 1
from brevitas.nn import QuantLinear, QuantReLU
import torch.nn as nn
brevitas_model = nn.Sequential(
    QuantLinear(input_size, hidden1, bias=True, weight_bit_width=weight_bit_width),
    nn.BatchNorm1d(hidden1),
    nn.Dropout(0.5),
    QuantReLU(bit_width=act_bit_width),
    QuantLinear(hidden1, hidden2, bias=True, weight_bit_width=weight_bit_width),
    nn.BatchNorm1d(hidden2),
    nn.Dropout(0.5),
    QuantReLU(bit_width=act_bit_width),
    QuantLinear(hidden2, hidden3, bias=True, weight_bit_width=weight_bit_width),
    nn.BatchNorm1d(hidden3),
    nn.Dropout(0.5),
    QuantReLU(bit_width=act_bit_width),
    QuantLinear(hidden3, num_classes, bias=True, weight_bit_width=weight_bit_width)
)
# replace this with your trained network checkpoint if you're not
# using the pretrained weights
trained_state_dict = torch.load("state_dict.pth")["models_state_dict"][0]
# Uncomment the following line if you previously chose to train the network yourself
#trained_state_dict = torch.load("state_dict_self-trained.pth")
brevitas_model.load_state_dict(trained_state_dict, strict=False)
Define inference with the normal (Brevitas) model:
def inference_with_brevitas(current_inp):
    brevitas_output = brevitas_model.forward(current_inp)
    # apply sigmoid + threshold
    brevitas_output = torch.sigmoid(brevitas_output)
    brevitas_output = (brevitas_output.detach().numpy() > 0.5) * 1
    # convert output to bipolar
    brevitas_output = 2 * brevitas_output - 1
    return brevitas_output
Define inference with the transformed (FINN-ONNX) model:
import finn.core.onnx_exec as oxe
def inference_with_finn_onnx(current_inp):
finnonnx_in_tensor_name = model_for_sim.graph.input[0].name
finnonnx_model_in_shape = model_for_sim.get_tensor_shape(finnonnx_in_tensor_name)
finnonnx_out_tensor_name = model_for_sim.graph.output[0].name
# convert input to numpy for FINN
current_inp = current_inp.detach().numpy()
# add padding and re-scale to bipolar
current_inp = np.pad(current_inp, [(0, 0), (0, 7)])
current_inp = 2*current_inp-1
# reshape to expected input (add 1 for batch dimension)
current_inp = current_inp.reshape(finnonnx_model_in_shape)
# create the input dictionary
input_dict = {finnonnx_in_tensor_name : current_inp}
# run with FINN's execute_onnx
output_dict = oxe.execute_onnx(model_for_sim, input_dict)
#get the output tensor
finn_output = output_dict[finnonnx_out_tensor_name]
return finn_output
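Before reaching FINN, each input row goes through the same pad-then-rescale pair as the weights did: zero padding up to 600 features, then the {0,1} -> {-1,+1} map. Illustrated on a toy 3-feature row (toy sizes, not the real 593/600):

```python
import numpy as np

# toy row of {0,1} features; pad 3 -> 5 columns, then rescale to bipolar
row = np.array([[0.0, 1.0, 1.0]], dtype=np.float32)
row = np.pad(row, [(0, 0), (0, 2)])  # padded zeros stay 0 for now
row = 2 * row - 1                    # {0,1} -> {-1,+1}; padding becomes -1
print(row.tolist())  # [[-1.0, 1.0, 1.0, -1.0, -1.0]]
```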
Compare the two models:
import numpy as np
from tqdm import trange
verify_range = trange(n_verification_inputs, desc="FINN execution", position=0, leave=True)
brevitas_model.eval()
ok = 0
nok = 0
for i in verify_range:
    # run in Brevitas with PyTorch tensor
    current_inp = input_tensor[i].reshape((1, 593))
    brevitas_output = inference_with_brevitas(current_inp)
    finn_output = inference_with_finn_onnx(current_inp)
    # compare the outputs
    ok += 1 if finn_output == brevitas_output else 0
    nok += 1 if finn_output != brevitas_output else 0
    verify_range.set_description("ok %d nok %d" % (ok, nok))
    verify_range.refresh()

if ok == n_verification_inputs:
    print("Verification succeeded. Brevitas and FINN-ONNX execution outputs are identical")
else:
    print("Verification failed. Brevitas and FINN-ONNX execution outputs are NOT identical")
Reference: end2end_example/cybersecurity/3-build-accelerator-with-finn.ipynb. You can go there and run these commands yourself. Be sure to run the previous part to get the necessary .onnx files, then close and halt that notebook, because Netron visualizations use the same port.
Launch a build that only generates the estimate reports; this does not require any synthesis:
import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import os
import shutil
model_file = "cybsec-mlp-ready.onnx"
estimates_output_dir = "output_estimates_only"
# Delete previous run results if they exist
if os.path.exists(estimates_output_dir):
    shutil.rmtree(estimates_output_dir)
    print("Previous run results deleted!")
cfg_estimates = build.DataflowBuildConfig(
    output_dir          = estimates_output_dir,
    mvau_wwidth_max     = 80,
    target_fps          = 1000000,
    synth_clk_period_ns = 10.0,
    fpga_part           = "xc7z020clg400-1",
    steps               = build_cfg.estimate_only_dataflow_steps,
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
    ]
)
%%time
build.build_dataflow_cfg(model_file, cfg_estimates)
The generated reports will be in the output_estimates_only/report directory.
Generate the accelerator. This will take about 10 minutes because multiple calls to Vivado and a call to RTL simulation are involved.
import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import os
import shutil
model_file = "cybsec-mlp-ready.onnx"
rtlsim_output_dir = "output_ipstitch_ooc_rtlsim"
# Delete previous run results if they exist
if os.path.exists(rtlsim_output_dir):
    shutil.rmtree(rtlsim_output_dir)
    print("Previous run results deleted!")
cfg_stitched_ip = build.DataflowBuildConfig(
    output_dir          = rtlsim_output_dir,
    mvau_wwidth_max     = 80,
    target_fps          = 1000000,
    synth_clk_period_ns = 10.0,
    fpga_part           = "xc7z020clg400-1",
    generate_outputs=[
        build_cfg.DataflowOutputType.STITCHED_IP,
        build_cfg.DataflowOutputType.RTLSIM_PERFORMANCE,
        build_cfg.DataflowOutputType.OOC_SYNTH,
    ]
)
%%time
build.build_dataflow_cfg(model_file, cfg_stitched_ip)
You will find the accelerator exported as a stitched IP block design in the output_ipstitch_ooc_rtlsim/stitched_ip directory, and various reports in the output_ipstitch_ooc_rtlsim/report directory.
Generate the PYNQ bitfile and driver. This will take about 15-20 minutes.
import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import os
import shutil
model_file = "cybsec-mlp-ready.onnx"
final_output_dir = "output_final"
# Delete previous run results if they exist
if os.path.exists(final_output_dir):
    shutil.rmtree(final_output_dir)
    print("Previous run results deleted!")
cfg = build.DataflowBuildConfig(
    output_dir          = final_output_dir,
    mvau_wwidth_max     = 80,
    target_fps          = 1000000,
    synth_clk_period_ns = 10.0,
    board               = "Pynq-Z2",
    shell_flow_type     = build_cfg.ShellFlowType.VIVADO_ZYNQ,
    generate_outputs=[
        build_cfg.DataflowOutputType.BITFILE,
        build_cfg.DataflowOutputType.PYNQ_DRIVER,
        build_cfg.DataflowOutputType.DEPLOYMENT_PACKAGE,
    ]
)
%%time
build.build_dataflow_cfg(model_file, cfg)
The generated bitfile and .hwh file are located in the output_final/bitfile directory. The generated Python driver, located in the output_final/driver directory, lets us execute the accelerator on PYNQ platforms with simple numpy I/O. Reports are in the output_final/report directory. Finally, the output_final/deploy folder contains everything you need to copy onto the target board to get the accelerator running.
To test the accelerator on the board, we'll put a copy of the dataset and a premade Python script that validates the accuracy into the output_final/deploy/driver folder, then make a zip archive of the whole deployment folder.
! cp unsw_nb15_binarized.npz {final_output_dir}/deploy/driver
! cp validate-unsw-nb15.py {final_output_dir}/deploy/driver
from shutil import make_archive
make_archive('deploy-on-pynq', 'zip', final_output_dir+"/deploy")
You can now download the created zipfile (File -> Open, mark the checkbox next to the deploy-on-pynq.zip and select Download from the toolbar), then copy it to your PYNQ board (for instance via scp or rsync). Then, run the following commands on the PYNQ board (for example, open a Terminal on PYNQ with jupyter) to extract the archive and run the validation:
unzip deploy-on-pynq.zip -d finn-cybsec-mlp-demo
cd finn-cybsec-mlp-demo/driver
sudo python3.6 validate-unsw-nb15.py --batchsize 1000
You should see Final accuracy: 91.868293 at the end.
To see more detail while running validation, the generated driver includes a benchmarking mode that shows the runtime breakdown:
sudo python3.6 driver.py --exec_mode throughput_test --bitfile ../bitfile/finn-accel.bit --batchsize 1000
cat nw_metrics.txt